ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling

Abstract

Traditional text classification approaches often require a good amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they run into two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo has better performance for long inputs and a shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
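The cluster-then-classify idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the greedy Jaccard clustering stands in for the unsupervised clustering step, the keyword-overlap scorer stands in for a Transformer-based zero-shot classifier, and every function name, threshold, and keyword set below is a hypothetical chosen for illustration. The point it shows is structural: the (expensive) zero-shot scorer is called once per cluster on a short list of topic words, rather than once per full document.

```python
from collections import Counter

# Assumption: a tiny hand-picked stopword list, only for this toy example.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "for", "on", "after"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def cluster_documents(docs, threshold=0.2):
    """Greedy clustering (a stand-in for the paper's clustering step):
    a document joins the first cluster whose vocabulary it overlaps enough."""
    clusters = []  # each cluster: {"vocab": set of tokens, "members": [indices]}
    for idx, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        for cluster in clusters:
            union = tokens | cluster["vocab"]
            jaccard = len(tokens & cluster["vocab"]) / len(union) if union else 0.0
            if jaccard >= threshold:
                cluster["members"].append(idx)
                cluster["vocab"] |= tokens
                break
        else:
            clusters.append({"vocab": set(tokens), "members": [idx]})
    return clusters


def topic_words(docs, members, top_k=5):
    """Compressed representation of a cluster: its most frequent words."""
    counts = Counter(tok for i in members for tok in tokenize(docs[i]))
    return [word for word, _ in counts.most_common(top_k)]


def zero_shot_label(words, label_keywords):
    """Toy scorer (a stand-in for a Transformer zero-shot model):
    pick the label whose keywords overlap most with the topic words."""
    return max(label_keywords, key=lambda lab: len(set(words) & label_keywords[lab]))


def classify(docs, label_keywords):
    """Label each cluster once, then propagate the label to its documents."""
    predictions = [None] * len(docs)
    for cluster in cluster_documents(docs):
        words = topic_words(docs, cluster["members"])
        label = zero_shot_label(words, label_keywords)
        for i in cluster["members"]:
            predictions[i] = label
    return predictions


docs = [
    "The team won the football match after a late goal",
    "A late goal decided the football final",
    "The central bank raised interest rates",
    "Interest rates and inflation worry the bank",
]
# Hypothetical label descriptions; a real zero-shot model would use label names.
labels = {
    "sports": {"football", "goal", "match"},
    "economy": {"bank", "rates", "inflation"},
}
predictions = classify(docs, labels)
print(predictions)
```

In this sketch the four documents collapse into two clusters, so the scorer runs twice instead of four times; with long documents and a large language model as the scorer, that reduction is where the savings in execution time and input length would come from.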

Similar resources

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Zero-shot learners are models capable of predicting unseen classes. In this work, we propose a zero-shot learning approach for text categorization. Our method involves training a model on a large corpus of sentences to learn the relationship between a sentence and the embedding of the sentence's tags. Learning such a relationship makes the model generalize to unseen sentences, tags, and even new datasets p...

Short Text Understanding by Leveraging Knowledge into Topic Model

In this paper, we investigate the challenging task of understanding short text (STU task) by jointly considering topic modeling and knowledge incorporation. Knowledge incorporation can solve the content sparsity problem effectively for topic modeling. Specifically, the phrase topic model is proposed to leverage the auto-mined knowledge, i.e., the phrases, to guide the generative process of shor...

Zero-shot Cross Language Text Classifica-

Labeled text classification datasets are typically only available in a few select languages. In order to train a model for, e.g., news categorization in a language Lt without a suitable text classification dataset, there are two options. The first option is to create a new labeled dataset by hand, and the second option is to transfer label information from an existing labeled dataset in a source la...

Multi-label Dataless Text Classification with Topic Modeling

Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing works mainly center on single-label classification problems, that is, each document is restricted to belonging to a single category. In this paper, we pro...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-98305-5_12